

EDF FAQ by Bob Kemp

The official specification of the EDF format was published in 1992 by Bob Kemp, Alpo Värri, Agostinho C. Rosa, Kim D. Nielsen and John Gade as "A simple format for exchange of digitized polygraphic recordings" in Electroencephalography and Clinical Neurophysiology, 82 (1992) 391-393.

Several large multi-national studies, that used EDF as a vehicle to exchange polygraphic data, have demonstrated that the specification is sufficiently simple and leaves enough flexibility to be easily applied in practice. Therefore, any implementation of EDF should simply follow exactly the official specification. However, some questions were regularly asked during these studies. Therefore, this FAQ list may be of some additional help.

Changing your EDF implementation according to any of these answers does not cause any incompatibility with EDF files or software that followed the official specs. Neither would you lose any of the original simplicity or flexibility. Some answers define EDF export more strictly than the official specs do. But EDF import (reader) software should accommodate all options that the official specs leave to the implementor. The list may give you an idea of these options.

EDF was designed in one day and we originally had in mind the exchange of polygraphic recordings, mainly between PC's, in the old millennium. I suggest that you also abide by the three simple red-color additional guidelines (at Q3, Q7 and Q10), so your EDF can be used all over the world, between any machines and until the year 2084. If you want to use EDF also for the exchange of annotations, events and automatic or manual analysis results, then it is probably wise to adopt the green-color additional guidelines as well.

Here is the list of Questions and Answers:

Q1. For text fields in the header, what is the character set to use?
Export. EDF specs say that header information should be coded in ASCII strings. The American Standard Code for Information Interchange (ASCII) is 7 bits wide and consists of control characters (byte values 0..31 and 127, for instance for LineFeed, FormFeed, Carriage Return, Delete) and printable characters (32..126). So, unless you are looking for trouble, use only printable ASCII characters (32..126).
Import. Should an EDF file ask for trouble (that is, contain control characters), EDF readers should not try to execute these. If an EDF file contains control characters or otherwise illegal characters (127..255), warn the producer of that file.
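For illustration, a reader's check for illegal header bytes can be as simple as this Python sketch (the function name is mine, not part of the specs):

```python
def check_header_text(field: bytes):
    """Return the positions of illegal (non-printable-ASCII) bytes in an
    EDF header field; legal characters are byte values 32..126."""
    return [i for i, b in enumerate(field) if not 32 <= b <= 126]

print(check_header_text(b"Startdate 02-AUG-1951"))  # → [] (all characters legal)
print(check_header_text(b"bad\x0cfield"))           # → [3] (FormFeed control character)
```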

Q2. Is the correct syntax for the date and time fields DD.MM.YY and hh.mm.ss (D, M, Y, h, m, and s = [0..9]) as in "02.08.51"? I also saw "2.8.51" and " 2. 8.51".
Export. The official specs say "The information in the ASCII strings must be left-justified and filled out with spaces" and "8 ascii : startdate of recording (DD.MM.YY)" and "8 ascii : starttime of recording (hh.mm.ss)". The format does not specify that D, M, Y, h, m and s = [0..9]. Therefore, some may argue that a space or even a blank (null character, 0) is also allowed in the ASCII string. However, using spaces conflicts with the "left-justification" spec and the null character is a 'forbidden' ASCII control character (see Q1). So, my advice is to produce EDF date and time fields containing only characters 0..9 and the period (.) as a separator, for example "02.08.51".
Import. Still, EDF viewers should also accommodate " 2. 8.51" and "2.8.51". And it is probably wise (and not much work) to have them also accommodate different separators, like in 02:08-51 and 02/08'51.
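Such a tolerant import routine is little work. A Python sketch (the function name and error message are mine) that accepts "02.08.51", "2.8.51", " 2. 8.51" and different separators:

```python
import re

def parse_edf_date(field: str):
    """Tolerantly parse an 8-character EDF date/time field into three
    integers, treating any run of non-digit characters as a separator."""
    parts = [p for p in re.split(r"[^0-9]+", field.strip()) if p]
    if len(parts) != 3:
        raise ValueError("unparseable date/time field: %r" % field)
    return tuple(int(p) for p in parts)

print(parse_edf_date("02.08.51"))   # → (2, 8, 51)
print(parse_edf_date(" 2. 8.51"))   # → (2, 8, 51)
print(parse_edf_date("02:08-51"))   # → (2, 8, 51)
```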

Q3. How about the Y2K millennium problem?
In fact, it is a centennial problem. An EDFdate of "02.08.51" in the "Startdate of Recording" field could specify a recording from 2051, 1951, 1851, 1751, etc. First, it is wise to put the full date in the "local recording identification" field (80 free ASCII's), for instance in the format "Startdate 02-AUG-1951". This also avoids any confusion between American and European date format.
    Next, you can use 1985 as a clipping date. EDF was used for the first time in 1989. At that time, some older recordings from 1985 were also converted to EDF. No EDF was recorded before 1985. Therefore you can use 85 as a clipping date in your EDF software. Or in other words: if the EDFyear (yy=51 in the above example) is equal to or larger than 85, then the real startdate is assumed to be EDFdate + 1900. If the EDFyear is smaller than 85, the real date is assumed to be EDFdate + 2000. In other words, in the EDF startdate, yy=00-84 means yyyy=2000-2084 and yy=85-99 means yyyy=1985-1999.
    This clipping date was discussed and adopted by the Siesta project in 1999 and is also in my viewer PolyMan.
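The clipping rule amounts to one line of code (a Python sketch, function name mine):

```python
def edf_full_year(yy: int) -> int:
    """Apply the 1985 clipping date to a 2-digit EDF startdate year:
    yy=85..99 means 1985..1999, yy=00..84 means 2000..2084."""
    return 1900 + yy if yy >= 85 else 2000 + yy

print(edf_full_year(85))  # → 1985
print(edf_full_year(51))  # → 2051
print(edf_full_year(84))  # → 2084
```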

Q4. Are the "digital minimum" and "digital maximum" values hints or strict limits?
The specs say "The digital minimum and maximum of each signal should specify the extreme values that can occur in the data records." Note the word "can". It is not necessary that these values actually DO occur. So take safe values that you know the signal will not exceed, for instance the range of the ADC. Note that "The physical (usually also physiological) minimum and maximum of this signal should correspond to these digital extremes". This correspondence is necessary for assessing gain and offset of the signal.

Q5. Why not always use -32767 for "digital minimum" and +32767 for "digital maximum"?
Export. It is formally correct EDF as long as the purpose (specification of offset and amplification of the signal) is met with sufficient accuracy.

Q6. Which is the preferred method of encoding a channel, where gain = (physical maximum - physical minimum) /(digital maximum - digital minimum) is negative? Using physical minimum > physical maximum or using digital minimum > digital maximum?
Export. The specs say "The digital minimum and maximum of each signal should specify the extreme values that can occur in the data records. These often are the extreme output values of the A/D converter. The physical (usually also physiological) minimum and maximum of this signal should correspond to these digital extremes...". So, just reading this chronologically, first specify digital maximum > digital minimum, then derive the 'corresponding' physical minimum and physical maximum which in this case leads to physical minimum > physical maximum.
Import. Import routines should allow both alternatives because it is not much programming (just get gain and offset) and because someone else may have an interpretation different from mine.
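Getting gain and offset from the four header fields handles both alternatives automatically, as this Python sketch shows (the function name is mine):

```python
def to_physical(digital, dig_min, dig_max, phys_min, phys_max):
    """Convert a digital sample to its physical value using the header
    extremes. Works for both encodings of a negative gain (physical
    minimum > maximum, or digital minimum > maximum), because gain and
    offset follow directly from the four fields."""
    gain = (phys_max - phys_min) / (dig_max - dig_min)
    return phys_min + (digital - dig_min) * gain
```

For instance, with digital extremes -32767..32767 and physical extremes -500..500 uV, a digital sample of 32767 maps to 500 uV; swapping the physical extremes flips the sign of the gain.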

Q7. Are "+22", ".5", "1E3" valid syntaxes for number fields?
Yes, as long as the numbers are left-justified in the ASCII strings and filled out with spaces. "22" and "-1.23E-4" are also OK. In the latter example, better accuracy can be obtained by using a standardized dimension prefix. So use "-123.456" and the dimension "uV      " rather than "-1.23E-4" and the dimension "V       ". In accordance with the examples in the original publication and in order to avoid Continental / (American) English confusions, never use a comma "," for a digit grouping symbol, nor for a decimal separator. When a decimal separator is required, use a dot (".") only.
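All of these forms map directly onto standard floating-point parsing once the space padding is stripped, as in this Python sketch (function name mine):

```python
def parse_edf_number(field: str) -> float:
    """Parse an EDF number field: left-justified, space-padded ASCII
    such as '+22', '.5', '1E3' or '-1.23E-4'."""
    return float(field.strip())

print(parse_edf_number("+22     "))  # → 22.0
print(parse_edf_number(".5      "))  # → 0.5
print(parse_edf_number("1E3     "))  # → 1000.0
```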

Q8. How to specify signals that cannot be calibrated (like an oral-nasal thermocouple for respiration flow, or an event button)?
Export. Just set the physical dimension to some meaningless value like "        ". Put appropriate values in the digital minimum/maximum fields and dummy values in the physical minimum/maximum fields. Do not make physical minimum = physical maximum because that may result in 'division by zero' errors in programs that compute the signal gain from these values.
Import. Some EDF files may not contain valid numbers in the digital/physical minimum/maximum fields, especially when signals were not calibrated. It should still be possible to read these signals, be they uncalibrated.

Q9. Do non-integer sampling frequencies (like 1/30 Hz) cause problems?
Not necessarily. Good viewers will count samples and compare these with "number of samples in a datarecord" and in this way count how many datarecords have passed (and consequently how many durations of a datarecord). Because this is all integer computation, there are no round-off errors! This is why EDF recommends the "duration of a datarecord" to be an integer number of seconds. In the 1/30 Hz example, "duration of a datarecord" and "number of samples in a datarecord" can be 30 and 1, respectively. Or 3600 and 120, respectively.
    However, if a sampling frequency is 999.98Hz (for instance due to a small inaccuracy of the ADC clock), 'integer EDF' would be possible using datarecords of 50s with 49999 samples of each signal in this datarecord. Even if only one signal is in the file, there would be more than 61440 bytes in a datarecord. The official specs say that in that case the duration should be a float value less than 1s. This will inevitably cause a small round-off error in the timing.
    In an even more extreme example, like 999.999998Hz, it is better to assume it to be 1000Hz (so for instance 1000 samples in a 1s datarecord) because this causes a smaller error than a non-integer duration of the datarecord would.
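The round-off-free sample counting can be sketched in Python using exact rational arithmetic (the function name is mine):

```python
from fractions import Fraction

def sample_time(sample_index, samples_per_record, record_duration_s):
    """Exact time offset (in seconds) of one signal's sample, using only
    integer/rational arithmetic so that non-integer sampling frequencies
    (like 1/30 Hz, or 999.98 Hz stored as 49999 samples per 50 s record)
    cause no round-off error."""
    record = sample_index // samples_per_record   # whole datarecords passed
    within = sample_index % samples_per_record    # position inside the record
    return record * record_duration_s + Fraction(within * record_duration_s,
                                                 samples_per_record)

print(sample_time(120, 1, 30))  # → 3600 (the 1/30 Hz example)
```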

Q10. Are the 2-byte samples in the data blocks written in big or little endian?
Indeed, the byte order for the integer datasamples is different in (a.o.) Intel and Motorola processors. In the first EDF application, described in the original article, the Intel little endian byte order was applied (see section Results) because we had mainly PC's in mind. That is, the lower-significance byte was stored before (at lower address than) the higher-significance byte: the integer samples were stored "little-end-first". At present (March 1999) probably all EDF files in the world are in the little endian format and certainly all EDF viewers expect so. Let us keep it that way and ask the Motorola users to force the little endian in their routines. Some Sun users already did so in Matlab. So, EDF samples should be stored in the little endian format (the default format in PC applications).
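Forcing little endian regardless of the host CPU is easy in most languages; in Python, the '<' prefix of the struct module does exactly that (the function name is mine):

```python
import struct

def decode_samples(raw: bytes):
    """Decode 2-byte EDF samples from a datarecord: signed 16-bit integers,
    little endian ('<' forces little-end-first, whatever the host CPU)."""
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))

print(decode_samples(b"\x01\x00\xff\x7f\x00\x80"))  # → [1, 32767, -32768]
```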

Q11. What are common errors in EDF files?

  • Datablocks larger than 61440 bytes (problem for some viewers).
  • Non-standard ascii characters (byte values 127-255) in the header.
  • Not specifying the number of datarecords. Note that '-1' can only be used before or during the recording. After the completion of a recording, the actual number of datarecords is known and should be specified.
  • Inaccurate signal labels (like EEG abdomen for respiration effort signal).
  • Incorrect transducer type (such as AgAgCl electrode for a rectal temperature probe signal).
  • Incorrect physical dimensions (for instance uV for a respiration signal coming from a thermistor).
  • Inaccurate or simply meaningless calibration values (i.e. physical and digital minimum/maximum), even in EDF files coming from very accurate equipment.
  • Empty prefiltering fields, even when time constants or lowpass filters were applied to the signal in the file (suggestion: let the recording equipment automatically specify something like "Bandpass 0.1-75Hz" in the prefiltering field).
     

Q12. What are common errors in EDF viewers?

  • Assuming that sampling frequencies are always higher than 1Hz.
  • Assuming that nobody would want to see 24 hours on one screen (for instance an EDF delta or temperature plot).
  • Assuming that EDF files always have the extension .eeg or .edf.
  • Specifying signal gains or time axis using cm or mm like in the paper EEG machines: it is better to use a vertical calibration bar and specify the number of seconds that is on the screen.


Q13. Do the mentioned EDF-supporting companies really provide correct EDF?
Since most companies started doing EDF only recently, I think it is not fair to tell now. Not all companies provide perfect EDF. So, if you plan to buy EDF equipment, check its EDF files using Alpo Värri's program CHECKREC and one of the free EDF viewers. Or mail me a file and I will do a rough check (this offer is valid until further notice). Tell the supplier to correct any errors. Next year I would like to start, with your cooperation, evaluating EDF companies and listing the results on this site.

Q14. How to encode free-text annotations?
Simply assign one of the signals in the file by giving it the label "ANNOTATION". Let the 'samples' of this signal in the datarecords store time-stamped annotations in standard ASCII format (see Q1) as follows. If the technician switches off the lights on 17 March 1999 at 23:54:12.2hr, this is stored as the 30-byte ASCII string '19990317235412.200@Lights off@' without the single quotes. Note that the time stamp has the order year-month-day-hour-minute-second.milliseconds and unknown characters are set to 0 (byte value 48). The .milliseconds may be omitted. Each signal sample offers two bytes of space and each byte contains one of the standard-ASCII byte values. Each annotation is stored byte-by-byte without changing their order. Unused bytes, in between the annotations, get byte value 0.
    Typically, annotations denote events that occurred in the datarecord in which they are written. Only if this is not possible, because there are too many or they are too large, may they overflow into earlier and/or later datarecords. Annotations must be in the file in chronological order. If two annotations have identical time stamps, their order is arbitrary. Any annotations that denote events that occurred in preceding datarecords must immediately follow the preceding annotation: there must be no 0-valued bytes between them. Any annotations that denote events that occurred in later datarecords must immediately precede the following annotation: there must be no 0-valued bytes between them.
    Choose the sampling frequency of the ANNOTATION signal high enough to accommodate the amount of text bytes to be stored. For instance, a sampling frequency of 5Hz accommodates 5*60*2=600 characters per minute. Since time stamp and text wrapper (yyyymmddhhmmss.mmm@@) take 20 characters, this sampling frequency allows an average of 6 lines of 80 characters or 10 lines of 40 characters per minute. This shows that an annotation sampling frequency of 5Hz is quite sufficient for most neurophysiological applications. Apparently, the annotations do not occupy much space when compared to the recorded signals.
    The main idea of this scheme was described by Maarten van de Velde et al. in the article "Digital archival and exchange of events in a simple format for polygraphic recordings with application in event related potential studies". This article describes still more possibilities for the encoding of events and was published in Electroencephalography and Clinical Neurophysiology, 106 (1998) 547-551.
    They suggest using the ISO-8859-1 (Latin1) extended ASCII coding scheme. But I prefer to use only the genuinely standard ASCII characters that are also allowed in the header (see Q1). I suggest you do the same because it avoids the ISO-8859 alphabet discussions between countries. The only languages that can comfortably be written with the repertoire of standard ASCII are Latin, Swahili, Hawaiian and American English. This is OK because EDF should support international exchange of data and it does not make much sense to send Arabic or Portuguese text to Chinese or Finnish colleagues. In order to disprove any suggestion that EDF is for Europeans only, I would suggest using American English rather than Latin.
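Packing annotations into the two-byte sample space of one datarecord can be sketched as follows (Python; the function name and error message are mine):

```python
def pack_annotations(annotations, samples_per_record):
    """Pack time-stamped annotation strings into the bytes of one
    ANNOTATION-signal datarecord: standard ASCII, stored back to back,
    padded with 0-valued bytes to fill 2 * samples_per_record bytes."""
    payload = "".join(annotations).encode("ascii")
    space = 2 * samples_per_record
    if len(payload) > space:
        raise ValueError("annotations overflow this datarecord")
    return payload + b"\x00" * (space - len(payload))

# A 5 Hz ANNOTATION signal in a 30 s datarecord offers 150 samples = 300 bytes:
record = pack_annotations(["19990317235412.200@Lights off@"], samples_per_record=150)
```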

Q15. How to encode physiological events such as apneas and leg movements?
Simply use the annotations encoding in a (possibly additional) ANNOTATION signal as described at Q14. Use standard texts 'Apnea onset', 'Apnea end', 'Leg movement onset' and 'Leg movement end'. For instance, an apnea with a duration of 35s and starting at 03:47:17.900 would be encoded as two events:
00000000034717.900@Apnea onset@ and 00000000034752.900@Apnea end@. In this way, information about the onset and duration of the event can be recovered from the annotations. It is important to use standard texts, so that automatic processing of the events is possible.
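Recovering the duration from the two time stamps is straightforward, as in this Python sketch (function name mine; it assumes the yyyymmddhhmmss.mmm@text@ layout from Q14 and, for simplicity, that both events fall on the same day):

```python
def annotation_time_s(annotation: str) -> float:
    """Time of day (seconds since 00:00:00) of an annotation such as
    '00000000034717.900@Apnea onset@', i.e. yyyymmddhhmmss.mmm@text@."""
    stamp = annotation.split("@", 1)[0]           # 'yyyymmddhhmmss.mmm'
    hh, mm = int(stamp[8:10]), int(stamp[10:12])
    ss = float(stamp[12:])                        # seconds, maybe fractional
    return 3600 * hh + 60 * mm + ss

onset = annotation_time_s("00000000034717.900@Apnea onset@")
end = annotation_time_s("00000000034752.900@Apnea end@")
# end - onset recovers the 35 s apnea duration
```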

Q16. How to store analysis results in EDF?
Any automatic or manual analysis result that is again a single or multi-channel timeseries (for instance a deltaplot together with an automatically scored hypnogram) can easily be stored in an EDF file. Some experience and discussions in the COMAC-BME and Siesta groups resulted in the following guidelines:
1. The analysis result should be stored in a separate EDF file. In order to reliably link the analysis file to the originally recorded file, the analysis program should:
- make the name part of the two filenames identical;
- make the extension part of the two filenames different;
- copy the patient-id line (80 characters) from the header of the recorded file to the header of the analysis file;
- preferably start the analysis at the exact beginning of the originally recorded file and let the program simply copy startdate and starttime from the originally recorded file into the analysis file. If there are good arguments not to start the analysis at the start of the recording, then at least make the timing of the analysis file (that is startdate, starttime, number and duration of the datarecords) correspond to the timing of the recorded file. So, if you analyse a portion from 23:05:00 till 23:25:00 of the original recording that was made on August 2, 1999, then the analysis file should have startdate 02.08.99 and starttime 23.05.00. Number and duration of the analysis-data records can be chosen according to the EDF guidelines and the applied smoothing windows. If, for example, your analysis-data records each refer to 30s of the recording, the mentioned analysis file should have 40 of these datarecords.
    In this way it is clear that both files refer to one time period in one person's life. Some EDF viewers (like PolyMan) are capable of showing the two (or more) files time-synchronized on one screen.
2. Apply suitable scaling factors in such a way that a large part of the available range of -32767 till 32767 for the values of the analysis results is used. Put these scaling factors in the header (digital and physical minimum and maximum) of the analysis file. If necessary, the scaling factor can be adapted to the dynamic range of the analysis result after the analysis was done.
2b. If solution 2 is really, really impossible because the useful dynamic range of the analysis result is too large, but only then, apply the standardized logarithmic transformation to store floating point values in EDF. However, be aware that viewers that do not yet accommodate the appropriate exponential inverse scaling can only show the results on a logarithmic scale. So really try solution 2 first!
3. If the analysis contains a hypnogram, sleep stages W,1,2,3,4,R,M should be coded in the datablocks as the integer numbers 0,1,2,3,4,5,6 respectively. Unscored epochs should be coded as the integer number 9.
4. Automatically document the analysis principle and parameters in the Recording-id, Label, Transducer type, Physical dimension and Prefiltering fields in the header of the analysis file.

Q17. Should the starttime of the recording be in local time or for instance in Greenwich Mean Time?
Everybody until now (2000) uses local time, so I suggest that you do the same.

Q18. Are there any standard texts for the EDF ascii fields?
With the help of several colleagues, I constructed some standard texts.
    These texts comply with the official specs and therefore do not cause any incompatibility with EDF software. EDF import (reader/browser, analysis) software should abide by the official specs and not depend on these standard texts. However, if the software detects that the imported file does contain standard texts, it can automatically recognize labels and dimensions.
    Using standard texts is not required for EDF compatibility. However, they reduce the probability of errors and avoid the need for user input in some types of automatic analysis programs. Therefore, it is wise to use the standard texts wherever possible.

Q19. Can EDF store hypnograms?
Yes, of course. Simply consider that a hypnogram is a single signal of 1 sample per 30s (or in some labs per 20s). For instance, all 1770 hypnograms made in the Siesta project are stored in an EDF file. The sleep stages W, 1, 2, 3, 4, R, MT and 'unscored' were coded in the EDF files as integer numbers 0, 1, 2, 3, 4, 5, 6 and 9, respectively. The EDF recording of an OSAS patient contains not only the polygraphic signals but also the hypnogram as one of the signals.
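Following the coding given above, writing a hypnogram as the integer samples of a 1/30 Hz signal amounts to a table lookup (a Python sketch; the names are mine):

```python
# Stage codes as used in the Siesta project files (see above):
STAGE_CODE = {"W": 0, "1": 1, "2": 2, "3": 3, "4": 4, "R": 5, "MT": 6,
              "unscored": 9}

def encode_hypnogram(stages):
    """Encode a scored hypnogram (one stage label per 30 s epoch) as the
    integer samples of a 1-sample-per-30s EDF signal."""
    return [STAGE_CODE[s] for s in stages]

print(encode_hypnogram(["W", "1", "2", "R", "unscored"]))  # → [0, 1, 2, 5, 9]
```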
